Using a Variety of n-Grams for the Detection of Different Kinds of Plagiarism Notebook for PAN at CLEF 2013

نویسندگان

  • Prasha Shrestha
  • Thamar Solorio
چکیده

A text can be plagiarised in different ways. The text may be copied and pasted word by word, parts of the text may be changed, or the whole text may be summarised into one or two lines. Different kinds of plagiarism require different strategies to detect them. But rarely do we know beforehand what type of plagiarism we are dealing with. In this paper we present a system that can detect verbatim plagiarism and obfuscated plagiarism with similar accuracy. Our system uses three different types of n-grams: stopword n-grams, n-grams with at least one named entity, and all words n-grams. After detecting and merging the detections to obtain passages, the system finds new detections incrementally based on the idea that the passages on the vicinity of plagiarised passages are more likely to be plagiarised. The system performs well on verbatim as well as obfuscated plagiarism cases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Alignment Module in CoReMo 2.1 Plagiarism Detector Notebook for PAN at CLEF 2013

This paper describes the process and basics of the Text Alignment Module into the CoReMo 2.1 Plagiarism Detector, which has won the Plagiarism Detection Text Alignment task in PAN-2013 edition, for both evaluation criteria of efficacy and efficiency, achieving the best detections and the best runtime too. Its high detection efficacy is mainly due to the special features of the contextual n-gram...

متن کامل

Diverse Queries and Feature Type Selection for Plagiarism Discovery Notebook for PAN at CLEF 2013

This paper describes approaches used for the Plagiarism Detection task in PAN 2013 international competition on uncovering plagiarism, authorship, and social software misuse. We present modified three-way search methodology for Source Retrieval subtask and analyse snippet similarity performance. The results show, that presented approach is adaptable in real-world plagiarism situations. For the ...

متن کامل

Developing Monolingual Persian Corpus for Extrinsic Plagiarism Detection Using Artificial Obfuscation: Notebook for PAN at CLEF 2015

The task of text alignment corpus construction at PAN 2015 competition consists of preparing a plagiarism corpus so that it can provide various obfuscation types and versatile obfuscation degrees. Meanwhile, its format and metadata structure should follow previous PAN plagiarism corpora. In this paper, we describe our approach for construction of a monolingual Persian plagiarism corpus that can...

متن کامل

Using Statistic and Semantic Analysis to Detect Plagiarism Notebook for PAN at CLEF 2013

This paper describes an approach submitted to the 2013 PAN competiton for the source retrieval sub-task. Three different methods for extracting queries were used, which employed tf-idf, noun phrases and named entities, in order to submit very different queries and maximize recall.

متن کامل

Using Hybrid Similarity Methods for Plagiarism Detection Notebook for PAN at CLEF 2013

At PAN2013 we decided to focus entirely on Text Alignment subtask. Following our previous experience at PAN2012 and CLINSS2012, we decided to put together the approaches we used in previous year to face the new challenges of PAN2013. This year competition added new way of plagiarism obfuscation via text summarization. This particular feature required represents a wide variety of typical cases o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013